Data preparation

The data files were merged and restructured:

Does this make sense? We could try to compare different types of policies by taking the metadata from our data collection.

Sampling methods

Include table of how we determined the number of universities to sample per country.

Sample overview

Indicators by country

Alternative variant of tile plot

Further tile plot according to ranking position

Indicators overall

Number of criteria per country

country         n_unis  n_criteria  criteria_found  proportion_of_all_criteria
Austria              6          17              29                         28%
Brazil              12          17              61                         30%
Germany             12          17              64                         31%
United Kingdom      24          17             109                         27%
India               12          17              33                         16%
Portugal             6          17              27                         26%
United States       35          17             106                         18%
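
The proportion column appears to be `criteria_found` divided by the number of possible university-criterion pairs (`n_unis * n_criteria`). This formula is inferred from the table values rather than stated in the analysis, so the following is only a minimal sketch that recomputes the column under that assumption:

```r
# Values copied from the table above.
criteria <- data.frame(
  country = c("Austria", "Brazil", "Germany", "United Kingdom",
              "India", "Portugal", "United States"),
  n_unis = c(6, 12, 12, 24, 12, 6, 35),
  n_criteria = 17,
  criteria_found = c(29, 61, 64, 109, 33, 27, 106)
)

# Assumed formula: share of all possible (university, criterion) pairs found.
criteria$proportion <- criteria$criteria_found /
  (criteria$n_unis * criteria$n_criteria)
criteria$proportion_pct <- round(100 * criteria$proportion)

print(criteria[, c("country", "proportion_pct")])
```

Under this assumption the recomputed percentages can be compared directly with the last column of the table.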

Problem with the above figure: there are only 6 data points per y-value (code), which a boxplot might obscure. We should probably show the individual points, or perhaps simply use vertical bars for each country.

Detailed table

The following figure shows the same information as above in a form that is easier to read directly, for example to look up the exact number of universities that mention a specific indicator.

The same information, displayed by country.

Correlation of indicators

Do the same for the US only.

Citation ranking vs citation policy

There is not much difference here.

Correlate rankings with indicators

Conclusions:

Significance levels (.05) are displayed, although they are probably not meaningful given the non-random sample. P values were adjusted to control the false discovery rate, using the methods of Benjamini, Hochberg, and Yekutieli.
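
For reference, base R's `p.adjust()` offers two false-discovery-rate adjustments associated with these authors: `"BH"` (Benjamini and Hochberg) and `"BY"` (Benjamini and Yekutieli). Which of the two was used here is not stated, so the sketch below shows both on made-up p values:

```r
# Made-up raw p values, for illustration only (not results from this analysis).
p_raw <- c(0.001, 0.008, 0.020, 0.041, 0.300)

# Benjamini-Hochberg: FDR control for independent or positively dependent tests.
p_bh <- p.adjust(p_raw, method = "BH")
# Benjamini-Yekutieli: FDR control under arbitrary dependence (more conservative).
p_by <- p.adjust(p_raw, method = "BY")

adjusted <- data.frame(p_raw, p_bh, p_by, significant_bh = p_bh < 0.05)
print(adjusted)
```

Because many pairwise correlations are tested at once, the adjusted values are the ones to compare against the .05 threshold.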

Now do the correlation for the US only.

## Warning in cor(., use = "pairwise.complete.obs"): the standard deviation is zero
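
This warning suggests that at least one variable is constant within the US subset (for example, an indicator that no US institution mentions), in which case its correlations are undefined and returned as `NA`. A minimal sketch of how such columns could be detected and dropped before correlating, using made-up 0/1 data and hypothetical column names:

```r
# Made-up binary indicator data; column names are hypothetical.
us <- data.frame(
  citations = c(1, 0, 1, 1, 0, 1),
  patents   = c(0, 0, 1, 1, 0, 1),
  software  = c(0, 0, 0, 0, 0, 0)  # constant: never mentioned in this subset
)

# Flag columns whose standard deviation is zero (or undefined).
sds <- sapply(us, sd, na.rm = TRUE)
constant <- is.na(sds) | sds == 0
print(names(us)[constant])

# With the constant column, cor() warns and returns NA for its pairs.
r_all <- suppressWarnings(cor(us, use = "pairwise.complete.obs"))

# Dropping constant columns first avoids both the warning and the NAs.
r_clean <- cor(us[, !constant, drop = FALSE], use = "pairwise.complete.obs")
print(round(r_clean, 2))
```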

Conclusions:

Principal component analysis

## Principal Components Analysis
## Call: psych::principal(r = x, nfactors = n, rotate = rotate)
## Standardized loadings (pattern matrix) based upon correlation matrix
##                               item   RC2   RC1   RC3   RC4   h2   u2 com
## Gender of reviewers              8  0.93                   0.88 0.12 1.0
## Gender equality                  7  0.89                   0.82 0.18 1.1
## Gender balance of reviewers      6  0.86                   0.75 0.25 1.0
## Number of publications          10  0.37                   0.14 0.86 1.1
## Engagement with policy makers    4        0.81             0.66 0.34 1.0
## Engagement with the public       5        0.74             0.55 0.45 1.0
## Engagement with industry         3        0.74             0.57 0.43 1.1
## Service to profession           14 -0.31  0.56             0.44 0.56 1.8
## Review & editorial activities   13        0.50  0.30       0.38 0.62 2.1
## Patents                         11              0.75  0.27 0.64 0.36 1.3
## Software                        15        0.24  0.70       0.56 0.44 1.3
## Publication quality             12        0.42 -0.48  0.34 0.54 0.46 2.9
## Citizen science                  2        0.31  0.40       0.26 0.74 1.9
## Citations                        1                    0.80 0.68 0.32 1.1
## Journal metrics                  9              0.31  0.69 0.59 0.41 1.5
## 
##                        RC2  RC1  RC3  RC4
## SS loadings           2.69 2.67 1.72 1.39
## Proportion Var        0.18 0.18 0.11 0.09
## Cumulative Var        0.18 0.36 0.47 0.56
## Proportion Explained  0.32 0.31 0.20 0.16
## Cumulative Proportion 0.32 0.63 0.84 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 4 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.08 
##  with the empirical chi square  151.77  with prob <  6.1e-12 
## 
## Fit based upon off diagonal values = 0.85
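
As a reading aid for the summary rows: in a PCA of a correlation matrix each standardized item contributes a variance of 1, so "Proportion Var" is the SS loadings divided by the number of items (15 in the loadings table). A short sketch that recomputes the rows from the printed, rounded SS loadings:

```r
# SS loadings copied from the psych::principal output above.
ss_loadings <- c(RC2 = 2.69, RC1 = 2.67, RC3 = 1.72, RC4 = 1.39)
n_items <- 15  # number of indicators listed in the loadings table

# Total variance of standardized items equals the number of items.
proportion_var <- ss_loadings / n_items
cumulative_var <- cumsum(proportion_var)

print(round(rbind(proportion_var, cumulative_var), 2))
```

Read this way, the four rotated components together account for a little over half of the total variance in the indicators.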

Maybe a correspondence analysis could help? It could be used to visualise the initial figure (tile plot). However, one must be careful because the sample sizes differ between countries. Does that matter? Another option would be a correspondence analysis of all variables against all variables, to see how they interrelate (as an alternative to the PCA, which is debatable given the binary data).
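
To make the idea concrete, here is a sketch of a simple correspondence analysis of a country-by-indicator count table in base R, computed via a singular value decomposition. The counts and labels are made up for illustration. Regarding unequal sample sizes: correspondence analysis compares row profiles and weights each row by its mass, so a larger sample gives a country more influence on the solution without changing its profile.

```r
# Made-up counts of universities mentioning each indicator, per country.
# Row totals are unequal on purpose; all values and names are illustrative.
counts <- matrix(
  c( 4,  2, 1,
     9,  6, 3,
    20, 11, 2,
     8, 15, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    country = c("A", "B", "C", "D"),
    indicator = c("citations", "engagement", "software")
  )
)

P <- counts / sum(counts)   # correspondence matrix
row_mass <- rowSums(P)      # larger samples get more weight
col_mass <- colSums(P)

# Standardized residuals from the independence model.
S <- diag(1 / sqrt(row_mass)) %*% (P - row_mass %o% col_mass) %*%
  diag(1 / sqrt(col_mass))
dec <- svd(S)

# Principal coordinates for rows (countries) and columns (indicators).
row_coords <- diag(1 / sqrt(row_mass)) %*% dec$u %*% diag(dec$d)
col_coords <- diag(1 / sqrt(col_mass)) %*% dec$v %*% diag(dec$d)
rownames(row_coords) <- rownames(counts)
rownames(col_coords) <- colnames(counts)

# Share of total inertia captured by each dimension.
inertia <- dec$d^2
print(round(inertia / sum(inertia), 3))
print(round(row_coords[, 1:2], 3))
```

Plotting the first two columns of `row_coords` and `col_coords` together would give the usual biplot. Dedicated packages exist for this, but the base R version keeps the computation visible.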

Countries on alternative indicators

Variables to collate: Data, OA, Citizen Science, Software, Gender equality, three forms of engagement.

country_name         mean         sd         se      upper      lower       Mean      Lower     Upper
Austria         1.3333333  1.5055453  0.6146363  1.9479696  0.7186970  1.3333333  0.3333333  2.500000
Brazil          1.9166667  1.3789544  0.3980698  2.3147365  1.5185968  1.9166667  1.2500000  2.666667
Germany         1.2500000  1.2154311  0.3508647  1.6008647  0.8991353  1.2500000  0.5833333  1.916667
India           0.3333333  0.6513389  0.1880254  0.5213587  0.1453080  0.3333333  0.0000000  0.750000
Portugal        1.8333333  0.4082483  0.1666667  2.0000000  1.6666667  1.8333333  1.5000000  2.000000
United Kingdom  1.7916667  1.2846643  0.2622310  2.0538977  1.5294357  1.7916667  1.2916667  2.291667
United States   0.6857143  1.2071217  0.2040408  0.8897551  0.4816735  0.6857143  0.3142857  1.085714
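
The lowercase `se`, `upper` and `lower` columns are consistent with the standard error of the mean (`sd / sqrt(n)`) and the mean plus or minus one standard error, taking `n` to be the number of universities per country from the earlier table. That reading is inferred from the values, not stated in the analysis. The capitalised `Mean`, `Lower` and `Upper` columns are asymmetric around the mean and come from a method that is not stated, so they are not reproduced here.

```r
# n, mean and sd copied from the tables above (n is n_unis).
alt <- data.frame(
  country_name = c("Austria", "Brazil", "Germany", "India",
                   "Portugal", "United Kingdom", "United States"),
  n    = c(6, 12, 12, 12, 6, 24, 35),
  mean = c(1.3333333, 1.9166667, 1.2500000, 0.3333333,
           1.8333333, 1.7916667, 0.6857143),
  sd   = c(1.5055453, 1.3789544, 1.2154311, 0.6513389,
           0.4082483, 1.2846643, 1.2071217)
)

# Standard error of the mean, and mean +/- one standard error.
alt$se    <- alt$sd / sqrt(alt$n)
alt$upper <- alt$mean + alt$se
alt$lower <- alt$mean - alt$se

print(round(alt[, c("mean", "se", "lower", "upper")], 3))
```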
